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Abstract 

This  paper  describes  an  efficient  technique  for  estimating,  via  simulation,  the  prob¬ 
ability  of  buffer  overflows  in  a  queueing  model  that  arises  in  the  analysis  of  ATM 
(Asynchronous  Transfer  Mode)  communication  switches.  There  are  multiple  streams 
of  (autocorrelated)  traffic  feeding  the  switch  that  has  a  buffer  of  finite  capacity.  Each 
stream  is  designated  as  either  being  of  high  or  low  priority.  When  the  queue  length 
reaches  a  certain  threshold,  only  high  priority  packets  are  admitted  to  the  switch’s 
buffer.  The  problem  is  to  estimate  the  loss  rate  of  high  priority  packets.  An  asymptoti¬ 
cally  optimal  importance  sampling  approach  is  developed  for  this  rare  event  simulation 
problem.  In  this  approach,  the  importance  sampling  is  done  in  two  distinct  phases.  In 
the  first  phase,  an  importance  sampling  change  of  measure  is  used  to  bring  the  queue 
length  up  to  the  threshold  at  which  low  priority  packets  get  rejected.  In  the  second 
phase,  a  different  importance  sampling  change  of  measure  is  used  to  move  the  queue 
length  from  the  threshold  to  the  buffer  capacity. 
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1  Introduction 


This  paper  is  concerned  with  efficient  simulation  techniques  for  estimating  very  low  packet 
loss  rates,  e.g.  less  than  10-9,  in  models  of  ATM  (Asynchronous  Transfer  Mode)  commu¬ 
nications  switches.  Such  low  packet  loss  rates  are  deemed  a  requirement  in  ATM  networks 
and  there  is  currently  great  interest  in  quantitative  methods  for  analyzing  the  performance 
of  ATM  models.  Standard  simulation  of  such  rare  events  is  known  to  take  a  prohibitive 
amount  of  time.  Therefore,  research  has  focused  on  how  to  apply  a  general  technique  called 
importance  sampling  [23,  25]  so  as  to  speed  up  the  simulation  of  such  rare  events.  A  survey, 
with  numerous  references,  on  the  application  of  importance  sampling  to  rare  event  simu¬ 
lation  of  models  in  queueing  and  reliability  theory  is  given  in  [26],  while  a  survey  on  fast 
simulation  of  reliability  models  is  given  in  [32].  Relevant  references  on  the  use  of  importance 
sampling  in  queueing  models  include  [1,  2,  3,  17,  18,  19,  22,  28,  29,  33,  34,  35,  37]. 

In  this  paper,  we  consider  simulation  of  a  single  switch,  the  architecture  and  control  of 
which  has  been  proposed  for  use  in  ATM  networks  [15,  16].  The  switch  operates  in  discrete 
time.  It  is  fed  by  K  (K  >  1)  independent  streams  of  traffic.  Each  stream  is  designated  as 
being  either  a  high  priority  or  low  priority  stream.  The  switch  can  service  up  to  c  (c  >  1) 
packets  in  each  unit  of  time.  The  switch  has  a  buffer  of  size  B  =  Bj  +  B2.  Whenever 
the  queue  length  (buffer  contents)  is  less  than  Bj,  then  both  high  and  low  priority  traffic 
streams  are  accepted  by  the  switch.  However,  when  the  queue  length  is  between  B\  and 
B\  +  B2,  low  priority  arrivals  are  rejected  and  only  high  priority  arrivals  are  admitted  to 
the  buffer.  When  the  queue  length  exceeds  B\  +  B2,  both  low  and  high  priority  packets  are 
rejected.  We  will  consider  estimating  the  loss  rate  of  high  priority  packets. 

In  order  to  model  realistic  situations,  it  is  necessary  to  consider  the  case  when  the  arrival 
streams  contain  autocorrelation,  as  may  occur  in  the  transfer  of  video  data.  Although 
somewhat  more  general  arrival  processes  can  be  handled,  for  simplicity  of  notation,  we  will 
consider  the  case  when  each  arrival  stream  is  governed  by  a  Markov  chain.  If  the  number  of 
sources  K  is  large,  and  if  Bi  -I-  B2  is  large,  then  exact  analysis  of  this  model  is  numerically 


infeasible  due  to  the  problem  of  state  space  size  explosion.  While  a  “fluid”  approximation 


to  this  “cell”  based  model  can  be  formulated  and  solved  (if  K  is  not  too  large)  [15,  16], 
the  difference  between  performance  measures  of  the  fluid  and  cell  models  is  neither  well 
understood  nor  easily  quantified.  (Indeed,  one  application  of  this  paper  is  to  facilitate  such 
an  understanding.) 


Fast  simulation  of  such  a  switch  without  priorities  (i.e.,  all  packets  are  high  priority 
and  B2  =  0)  was  considered  in  [9,  10].  In  those  papers,  the  asymptotically  optimal  (as 
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B  -*  oo )  importance  sampling  change  of  measure  was  described.  (Here,  asymptotically 
optimal  means  that  the  variance  of  the  estimate  goes  to  zero  at  the  fastest  possible  rate.) 
This  change  of  measure  is  closely  related  to  the  Gartner-Ellis  theorem  and  the  theory  of  large 
deviations  for  Markov  additive  processes  [3,  6,  7,  13,  20,  30,  36).  The  relationship  between 
“effective  bandwidths”  (see,  e.g.,  (8,  14,  21,  24,  27,  39])  and  asymptotically  optimal  fast 
simulation  is  also  explored  in  [9,  10].  (The  theory  of  effective  bandwidths  deals  with  the 
rate  at  which  large  queue  length  probabilities  decay.) 

In  this  paper,  we  show  how  the  results  of  [9,  10]  can  be  extended  to  the  switch  with 
priorities.  The  extension  is  unusual  in  several  respects.  First,  it  requires  piecing  together 
two  different  importance  sampling  changes  of  measure:  the  first  involving  both  high  and  low 
priority  packets  so  as  to  bring  the  queue  length  up  to  level  B\ ,  the  second  involving  only  the 
high  priority  packets  so  as  to  increase  the  queue  length  from  B\  to  B\  +  B2.  Assuming  that 
the  queue  is  stable  when  fed  with  both  high  and  low  priority  traffic,  both  of  these  events 
are  rare  and  need  to  be  simulated  differently.  Second,  additional  care  has  to  be  taken  in 
this  second  phase,  since  once  the  system  reaches  B\y  it  may  hover  there  for  some  period 
before  ascending  to  B\  +  B2  (assuming  it  does  reach  the  higher  level).  In  this  paper,  we 
describe  an  asymptotically  optimal  (as  both  B\  and  B2  — ►  00)  simulation  algorithm  for  this 
problem. 

The  rest  of  the  paper  is  organized  as  follows.  In  Section  2,  the  general  technique  of 
importance  sampling  and  its  application  to  the  single  priority  switch  is  reviewed.  In  Section 
3  we  describe  how  to  extend  this  approach  to  the  switch  with  priorities.  Experimental  results 
are  given  in  Section  4.  The  paper  is  then  summarized  in  Section  5. 

2  Background 

To  demonstrate  the  difficulty  involved  in  simulating  rare  events,  consider  a  simple  example 
of  estimating  the  probability  7 b  of  an  event  71b  that  becomes  rarer  and  rarer  as  B  — ►  00, 
i.e.,  lima-.oo  7b  =  0.  Suppose,  for  simplicity,  that  7 b  can  be  written  as 

IB  =  J  l{xtTiB}P(x)dx  =  C1) 

where  the  subscript  p  denotes  sampling  from  the  given  input  density  p  and  is  the 

indicator  of  the  rare  set  ftp,  i.e.,  1{X€72B}  =  1  if  x  €  TZb  and  1{*€7ZB}  =  0  if  x  £  71b- 
The  standard  simulation  estimate  of  7 b  is  7b  =  (l/A) Y^.=\  Ai  where  X\,...,Xn  are 
sampled  from  the  density  p  and  /„  =  l{xneKa}-  Then  Ep[7b]  =  7  and  the  variance  of  7b 
is  7b(1  -  1b)/N .  The  relative  error,  defined  to  be  the  standard  deviation  divided  by  the 
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expected  value,  is  proportional  to  l/v^B^V  which,  for  any  fixed  sample  size  N,  approaches 
oo  as  B  — »  oo.  This  means  that  very  large  sample  sizes  are  required  to  achieve  accurate 
estimates  of  7b  as  B  — *■  oo. 

Importance  sampling,  when  applied  properly,  can  avoid  this  problem.  Consider  the 
integral  representation  for  7 b  in  Equation  1.  Multiplying  and  dividing  the  integrand  by 
another  density  function  p'(x)  yields 

1  *  =  J  1i^nB)^jP,(x)dx  =  Epi  =  EAl{XznB)L{X)\  (2) 

where  L(x)  =  p(x)/p'(x)  is  called  the  likelihood  ratio  and  the  subscript  pf  denotes  sampling 
from  the  density  p'.  Equation  2  is  valid  so  long  as  pf{x)  >  0  for  all  x  €  72b  such  that 
p(x)  >  0.  Importance  sampling  is  based  on  Equation  2  and  can  be  described  as  follows. 
Draw  N  samples  Xif...,X] v  using  density  p>  and  define  Zn  =  LnI„  where  L„  =  L(Xn). 
Then,  by  Equation  2,  Ep>[Zn ]  =  7 g-  Thus  an  unbiased  (and  strongly  consistent  a &  N  -*  00) 
estimate  of  7 g  is  given  by 

w(p')=ii;z»=4i;/.in,  (3) 

n=l  n=l 

i.e.,  7 b  can  be  estimated  by  simulating  a  random  variable  with  a  different  density  and 
unbiasing  the  output  by  multiplying  by  the  likelihood  ratio.  Most  papers  on  importance 
sampling  deal  with  how  to  choose  a  change  of  measure  (in  this  case  the  density  p'(x))  so 
as  to  obtain  variance  reduction.  In  general,  importance  sampling  must  be  applied  with 
great  care,  since  there  are  examples  when  it  increases  the  variance,  and,  in  some  cases,  it 
can  even  make  the  variance  infinite.  However,  variance  reduction  is  obtained  by  making 
the  likelihood  ratio  small  on  72g,  which,  roughly  speaking,  is  accomplished  by  making  j/(x) 
large  for  x  €  72b >  i.e.,  by  picking  p'  so  as  to  make  the  rare  event  Tig  likely  to  happen. 
Suppose  there  are  constants  d\  and  dj  and  a  function  f(B)  such  that  f(B)-*0asB—*oo 
and 

dxf(B)  <  L(X)  <  d2f(B)  (4) 

for  all  X  €  72 g,  and  P'(Rg)  =  £p'[1{X€7Ib}]  stays  bounded  away  from  zero  as  B  — ►  00. 
Then  7b  =  £y[LnJn]  >  dif(B)P>(Hg)  and  £y[(L„/„)2}  <  (d2f(B))2.  Combining  these 
two  facts  implies  that  the  relative  error  of  75 (p')  remains  bounded  as  B  — ►  00,  i.e., 

Standard  Deviationl-Wo')]  ,  . 

limsup  - 1  v  /J  <  00  (5) 

B— *  00  7 B 

and  we  say  the  importance  sampling  change  of  measure  pf  has  "bounded  relative  error.” 
This  implies  that  only  a  fixed  sample  size  is  required  to  obtain  estimates  of  7b  of  a  given 
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precision,  no  matter  how  rare  Hg  is.  Such  an  estimate  is  also  asymptotically  optimal,  in 
the  sense  that  its  second  moment  approaches  zero  at  the  fastest  possible  rate,  [/(B)]3.  Note 
that  bounded  relative  error  is  also  obtained  if  d\f{B)  <  L(X)  <  Df(B)  for  all  X  €  Hg 
where  D  is  a  random  variable  such  that  £y  [D3]  <  oo. 

To  estimate  the  steady  state  packet  loss  rate,  r,  we  exploit  a  ratio  formula  [5, 1 1]  that  is 
an  extension  of  the  ratio  formula  that  holds  in  regenerative  systems  [12,  38].  (While,  with 
multiple  Markovian  sources  the  model  may  theoretically  regenerative,  the  regenerations  may 
occur  so  infrequently  as  to  make  the  regenerative  method  useless  for  practical  purposes.) 
For  some  subset,  4 ,  of  the  state  space  define  .4-cycles  to  begin  whenever  the  process  enters 
A.  (We  will  choose  A  to  correspond  to  an  empty  buffer.)  Then 

~  E|aj  (6) 

where  Y  is  the  number  of  packets  lost  during  an  .4-cycle  and  a  is  the  number  of  packets  that 
arrive  during  an  4-cycle.  In  Equation  6,  the  initial  distribution  is  the  stationary  distribution 
conditioned  on  the  process  just  entering  A.  Successive  4-cycles  are  identically  distributed, 
but  are  not  independent  as  they  would  be  in  the  regenerative  method.  For  large  buffer  sizes, 
the  event  Tig  =  {F  >  0}  is  a  rare  event;  Hg  is  the  event  that,  starting  with  an  empty  buffer, 
the  buffer  reaches  size  B  before  emptying.  Importance  sampling  can  be  used  to  estimate 
the  numerator,  while  standard  simulation  can  be  used  to  estimate  the  denominator.  In 
particular,  E[Y\  =  E\Y\R,b]P(Rb).  Estimation  of  E\Y]  is  difficult  because  P(TZb)  is  small. 
Therefore  we  will  concentrate  on  asymptotically  optimal  procedures  for  estimating  P(Rb)- 

We  next  review  how  to  estimate  P(Hb)  for  the  case  of  a  switch  of  size  B  without 
priority  classes.  The  model  is  a  discrete  time  queueing  model  having  K  independent  (po¬ 
tentially)  heterogeneous  Markovian  sources.  (Somewhat  more  general  arrival  processes  can 
be  handled;  see  [9,  10].)  The  switch  can  service  up  to  c  (c  >  1)  packets  in  each  time 
slot.  Let  afc(t)  denote  the  number  of  packets  from  source  k  that  arrive  during  time  slot 
t.  Let  a(t)  =  ai(t)  +  ...  +  a/c(t)  denote  the  total  number  of  arrivals  during  time  slot  t, 
4*(t)  =  Ofc(l)  +  ...  +  a*(t)  denote  the  total  number  of  source  k  arrivals  in  the  interval 
[l,t],  and  A(t)  =  4i(t)  +  . . .  +  4/c(t)  denote  the  total  number  of  arrivals  during  the  inter¬ 
val  [l,t].  If  Q(t)  denotes  the  queue  length  at  time  t,  then  Q(t)  is  given  by  Lindley  type 
recursion  Q(t  +  1)  =  [max(Q(t)  +  a(t  +  1),  B)  —  c]+;  this  recursion  takes  into  account  the 
finite  buffer  of  capacity  B.  For  source  k,  there  is  an  environment  variable  X 'k(t)  describing 
the  state  of  the  source.  The  distribution  of  arrivals  has  a  Markovian  structure  given  by 
P*(a,j|i)  =  P(ak(t)  =  a,Xk(t)  =  j\Xk(t  -  1)  =  i)  (0  <  a  <  b,  1  <  i,j  <  M)  where  b  and 
Af  are  finite  constants.  To  apply  importance  sampling,  we  draw  on  large  deviations  results 
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for  Markov  additive  processes.  Define  the  0- conjugate  process  (the  exponentially  twisted, 
or  tilted,  distribution)  with  parameter  0,  as  follows.  Let  A*(0)  be  the  spectral  radius  of  the 
matrix 

(7) 

asO 

and  let  kk(i,9)  be  the  corresponding  eigenvector.  (We  assume  that  Ak<t(i,j)  is  irreducible 
so  that,  by  the  Perron- FYobenius  theorem,  A*(0)  is  real  valued  and  h*(t,0)  >  0.)  Thus 

E  Ak,g(iJ)hk(j,9)  =  Xk(9)hk(i,  0).  (8) 

i= i 

The  twisted  distribution  for  source  k  is  defined  by 

io = p.(«k(i)  =  «,Xk(i) = -!>  =  ;)  =  (9) 

which  is  seen  to  be  a  probability  distribution  by  Equations  7  and  8.  Each  arrival  process 
will  be  simulated  (independently)  using  the  same  twisting  parameter  0  for  some  as  yet 
unspecified  value  of  0.  Because  of  the  form  of  Pki$(a,j\i),  the  likelihood  ratio  after  T 
transitions  has  a  simple  form: 


L(0,T) 


=  n£=i  nLi 


=  n  A*(0)Te-'^T>£$$$ 


=  exp  {-9A(T)  +  T  £?=1  log(Afc(0))}  H (0,  T) 


(10) 


where  H(9,T)  =  nLi  hk(Xk(0),9)/hk(Xk(T),0). 

Suppose  the  queue  is  empty  at  time  0  and  the  event  Kb  occurs  at  time  7b,  i.e.,  the 
queue  length  first  reaches  or  exceeds  level  B  at  time  Tb  and  is  non-empty  in  the  interval 
[1,7b].  Then  there  are  exactly  c  x  Tb  departures  in  [1,7b]  and,  since  Q(Tb)  >  B,  we 
must  have  A(Tb)  >  B  +  cTb-  Let  0(Tb)  be  the  (nonnegative)  “overshoot”  defined  by 
A(Tb)  =  0(Tb)  +  B  +  cTb •  Thus,  by  Equation  10 


L(9,TB)  =  exp  |-0B  -  90(TB)  +  Elog(A *(0))  -  9c  7b  j  H(9,TB).  (11) 


By  setting  0  =  0*  where  0*  satisfies 


v-'  l°g(Afc(0*))  _ 

k=\ 


—  c 


(12) 
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(assuming  it  exists)  results  in  the  further  simplification  that 


L(9\Tb)  =  (13) 

Because  the  state  space  of  each  source  is  finite,  both  H(9*,Tb)  and  0(Tb)  are  bounded. 
Thus,  on  Kb,  there  exist  constants  d\  and  da  such  that 

d*e~rB  <L{f,  TB)<d3e-9'B.  (14) 


Furthermore,  it  can  be  shown  that  the  queue  is  unstable  when  simulated  with  this  value  of 
0*  and  thus  Pa* {Kb)  stays  bounded  away  from  zero.  Thus,  importance  sampling  with  0* 
is  asymptotically  optimal  for  estimating  P{Kb)  and  lima_O0  \og{P{Ke))IB  =  —0*.  Note 
that  while  the  decay  rate  of  P{Kb),  -0*,  is  known,  the  constant  in  front  of  it  is  unknown; 
in  essence,  importance  sampling  is  estimating  this  constant. 

In  this  example,  the  effective  bandwidth  of  source  k  is  known  to  be  aj(0)  =  log(Afc(0))/0 
and  the  effective  bandwidth  of  all  the  sources  is  o*(0)  =  aj(0)  +  . . .  +  a.*K (0).  Note  that 
to  find  the  optimal  importance  sampling  change  of  measure  requires  solving  Equation  12, 
a*(0*)  =  c,  a  nonlinear  equation  involving  the  spectral  radiuses  of  the  sources. 

A  “splitting”  technique  [9,  31]  can  be  used  to  efficiently  estimate  both  E\Y]  and  £[a]. 
The  process  is  simulated  (without  importance  sampling)  until  it  is  approximately  in  steady 
state.  Then  whenever  the  process  enters  A ,  two  A-cycles  are  simulated  using  the  same 


starting  conditions;  one  using  importance  sampling  and  one  without  using  importance  sam¬ 
pling.  These  provide  samples  for  the  ratio  estimate.  Also,  the  A-cycle  simulated  without 
importance  sampling  provides  a  starting  point  (with  approximately  the  steady  state  distri¬ 
bution  on  A)  for  the  next  pair  of  samples.  In  particular,  let  VJ,  a,,  and  Li  be  the  samples  of 
V,  a  and  the  likelihood  ratio,  respectively,  obtained  from  the  i-th  pair  of  A— cycles.  Then, 


the  estimate  of  r  is  given  by 


Url  UYj 

E?=i «.  ‘ 


(15) 


Because  the  A— cycles  are  not  independent  and  identically  distributed,  the  method  of  batch 


means  [4]  can  be  used  for  variance  estimation. 


3  Fast  Simulation  of  the  Shared  Buffer  Switch 

Let  us  now  turn  to  the  simulation  of  the  shared  buffer  switch.  Let  H  denote  the  set  of  indices 
of  high  priority  arrival  streams  and  let  C  denote  the  set  of  indices  of  low  priority  arrival 
streams.  Let  o^(0)  =  J2{ken}  al( &)  h®  the  effective  bandwidth  of  the  high  priority  streams 


and  let  a*c{6)  =  £{*€£}  aI(^)  be  the  effective  bandwidth  of  the  low  priority  streams.  Note 
that  a"(d)  =  a^(8)  +  a£(0). 

We  again  consider  efficient  estimation  of  Tig,  which  is  the  event  that,  starting  empty, 
the  buffer  reaches  size  B  =  B\  +  B?  before  becoming  empty.  This  event  entails  first  reaching 
level  B\  and  then  reaching  B.  On  an  >1- cycle,  let  J  be  the  (random)  number  of  upcrossings 
of  level  B\  \  on  Tig,  J  >  1.  Let  F  be  the  upcrossing  index  on  which  the  buffer  first  reaches 
B,  i.e.,  if  the  queue  first  reaches  B  after  the  j-th  upcrossing  of  B\  but  before  the  (j  +  l)-st 
upcrossing  (assuming  it  exists),  then  F  =  j.  (If  the  buffer  level  B  is  not  hit,  define  F  =  oo.) 
Note  that 

=  13  hrmj}  ~  £  hr=jJ<J}  =  £  (16) 

j=i  j= l  j=i 

since  if  F  =  j  then  j  <  J  is  automatically  satisfied.  Thus,  assuming  the  cycle  is  started 
in  the  stationary  distribution, 

F{Kg)  =  £[f;  l(fw)]  =  f)  E[l(f-=«l  =  f ]P(F  =  j),  (17) 

J=1  J=1  J=1 

i.e.,  52/=i  l{F=i>  *s  an  unbiased  estimate  of  P(Hb).  Note  that,  given  J  >  1,  the  event  that 
J  >  1  may  not  be  rare,  i.e.,  given  at  least  one  upcrossing  of  B\,  there  may  be  many  up- 
crossings  of  B\ .  This  complicates  the  application  of  importance  sampling,  because  once  the 
process  crosses  B\ ,  just  pushing  the  process  immediately  towards  B  (using  an  appropriate 
change  of  measure)  will  not  work  well  (since  doing  so  will  not  produce  accurate  estimates 
of  P(F  =  j)  for  j  >  1). 

One  may  also  look  at  this  in  another  way.  For  each  j,  the  optimal  importance  sampling 
distribution  for  estimating  P(F  =  j )  is  different.  Also  there  are  infinitely  many  P(F  =  j)’s, 
so  one  cannot  run  a  different  simulation  for  estimating  each  of  the  P(F  =  j)'s.  The 
technique  described  below  provides  a  way  out  of  this  problem. 

We  proceed  as  follows.  We  first  use  importance  sampling  to  move  the  system  up  to 
level  B\.  When  B\  is  crossed  we  turn  off  importance  sampling  until  the  end  of  the  A-cycle 
is  reached.  We  call  this  the  base  path.  This  base  path  is  used  to  generate  samples  of  J , 
the  number  of  upcrossings  of  B\.  Whenever  the  base  path  crosses  B\  (including  the  first 
upcrossing),  we  “split”  the  base  path.  On  upcrossing  j  ( j  =  1,. . .,  J)  importance  sampling 
is  applied  on  the  “j'-th  split  path”  to  try  to  bring  the  system  up  to  level  B  during  this  j'-th 
upcrossing  of  B\.  The  initial  conditions  of  the  y-th  split  path  are  the  same  as  that  of  the 
base  path  just  after  it  crosses  B\  for  the  j- th  time.  The  j-th  split  path  is  simulated  until 
either  B  is  hit  or  the  queue  length  drops  below  B\,  whichever  comes  first.  This  splitting  of 
the  base  path  is  illustrated  in  Figure  1. 
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Let  Lq  denote  the  likelihood  ratio  during  the  buildup  from  0  to  Bi  and  Lj  denote  the 
likelihood  ratio  during  the  j- th  split  path.  Define 

J  oo  oo 

Z  =  EWl*,^}  =  T,L°L>l<K,nC,}l<j<J}  =  E  LoLjl^nc,}  (18) 

i= l  i= i  i=i 

where  fty  is  the  event  that  the  j-th  split  path  hits  B  before  downcrossing  B\  and  Cj  is 
the  event  that  the  base  cycle  has  not  hit  B  before  the  start  of  the  j-th  split  path,  i.e., 
Cj  =  {F  >  j}  fl  {j  <  J}.  With  this  notation,  we  associate  the  random  variables  F  and 
J  only  with  the  base  path,  and  not  the  split  paths.  Then  Lof'jl^nc,}  is  an  unbiased 
estimate  of  P(F  =  j).  This  is  because,  the  part  of  base  path  from  the  beginning  of  the 
A-  cycle  until  the  j-th  split,  combined  with  the  j-th  split  path,  may  be  viewed  as  a  new 
base  path  (that  also  has  J  >  j)  that  has  been  generated  using  the  following  change  of 
measure:  at  the  beginning  of  the  A— cycle,  push  the  the  process  towards  B\  and  then  wait 
until  B\  is  upcrossed  j  —  1  times  more  before  pushing  it  towards  B.  Denote  this  change  of 
measure  by  py.  The  random  variable  1  defined  on  the  original  A— cycle  with  split 

paths,  is  equivalent  to  the  random  variable  1{F=J}  defined  on  this  new  base  path.  Then  we 
get  unbiasedness  from  the  fact  that 

E,;(£oM<F.»]  =  £[l(F=i)J  =  P(F  =  j).  (19) 

Thus,  Z  is  an  unbiased  estimate  of  P(7Zb)-  From  Equation  18  we  see  that  we  need  not 
generate  split  paths  off  the  base  path  if  the  base  path  has  already  hit  level  B. 

We  now  turn  to  the  specifics  of  how  importance  sampling  is  applied.  On  the  build-up 
from  0  to  B\,  the  queue  behaves  identically  to  the  queue  analyzed  in  Section  2.  Thus 
an  asymptotically  optimal  estimate  of  P(J  >1)  (hitting  B\)  is  obtained  by  exponential 
twisting  of  both  the  high  and  low  priority  arrival  streams  with  parameter  defined  to  be 
the  solution  of 

o*(0?)  =  o^)  +  a*cW)  =  c,  (20) 

assuming  such  a  exists.  (Assuming  it  exists,  9\  >  0  since  the  queue  is  stable.)  In 
particular,  there  exist  constants  d\  and  di  such  that 

dxe~9'Bi  <  L0  <  d2e~9*Bl.  (21) 

Similarly,  during  the  j-th  upcrossing  of  B\ ,  the  system  behaves  like  a  queue  with  only  high 
priority  arrivals  building  up  from  some  level  Oj  to  B2  (without  going  below  0)  where  Oj  is 
the  overshoot  on  the  j-th  upcrossing  of  B\ .  Since  the  sources  are  finite,  there  exists  some 
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constant  d  such  that  Oj  <  d.  Thus  exponential  twisting  of  the  high  priority  streams  with 
parameter  0%  defined  to  be  the  solution  (assuming  it  exists)  of 

a«(®3)  =  c  (22) 

should  be  effective  for  large  B 2.  (Again,  assuming  it  exists,  0%  >  0  since  the  queue  is  stable.) 
During  this  phase,  low  priority  streams  are  simulated  from  their  given  distributions  (and 
therefore  do  not  contribute  to  the  likelihood  ratio).  To  make  this  more  precise,  if  the  time 
axis  is  renumbered  so  that  the  j- th  upcrossing  of  B\  occurs  at  time  0  and  level  B\  +  B2  is 
first  hit  on  this  up-crossing  at  time  Tb,  then  the  number  of  high  priority  arrivals  in  [1,7b] 
is  between  B2  -  d  +  Tbc  and  B2  +  Tgc  +  d,  corresponding  to  the  bounding  cases  that  (i) 
Oj  =  d  and  the  overshoot  upon  hitting  B  is  zero,  and  (ii)  Oj  =  0  and  the  overshoot  upon 
hitting  B  is  d  (its  maximum),  respectively.  Thus  when  0  =  02,  there  exist  constants  d3  and 
d4  such  that  on  TZj  fl  Cj 

d3e~9*Bi  <  Lj  <  d4e~9*Bl.  (23) 

In  addition,  the  queue  is  unstable  when  simulated  with  at  0 J.  Therefore,  by  Equation  18, 
the  importance  sampling  estimate  (from  a  single  A -cycle)  satisfies 

<  Z  <  d2d4Je-em>B'-e;B>.  (24) 

The  event  n  C\  is  simply  the  event  that  the  base  path  hits  B\  and  the  first  split  path 
hits  B.  Thus 

P(nB)  >  dxd3Peit6.{n4  D Cl)e-91B'-0'>B\  (25) 

Note  that  C\  =  {J  >  1},  so  Pe*,8*{Tl\  HCi)  =  P$i(T^\\J  >  1  )Pe*(J  >  1).  Under  importance 
sampling  Pe*(Tlx\J  >  1)  and  P$’(J  >  1)  are  both  bounded  away  from  zero,  since,  as 
described  above,  they  both  represent  the  probability  that  an  appropriate  unstable  queue 
reaches  some  queue  length  before  returning  to  zero.  (Note  that  the  distribution  of  J  depends 
only  on  0 J,  since  the  base  path  turns  importance  sampling  off  after  B\  is  crossed  for  the 
first  time.)  Similarly,  by  Equation  24, 

Eex,9l[Z2)  <  [d2d4]2Ee;[J2}e-29 (26) 

Thus  if,  Eg*[J2]  <  00,  the  importance  sampling  estimate  of  P(71b)  has  bounded  relative 
error  (and  is  thus  asymptotically  optimal).  It  is  intuitively  clear  that  this  must  be  so,  since 
(J  -  1)  is  the  number  of  upcrossings  of  level  B\  of  a  queue  with  negative  drift  started  just 
above  level  B\.  A  proof  that  E$>[J2}  <  00  is  given  in  Appendix  A.  To  formalize  this  result, 
we  state  the  following  theorem. 
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Theorem  1  Let  B\  =  fB  and  B2  =  ( 1  -  f)B  for  some  constant  f,  0  <  /  <  1.  If  there 
exist  constants  0J  >  0  and  9^  >  0  such  that  +  amc{9 J)  =  c  and  o^^)  =  c,  then 


Um  = 


-rxf-r^\-D 


lim 

B-»oo 


\og(Ew'Z*)) 

B 


=  -2Pxf  -2^(1-/) 


(27) 

(28) 


i.e.,  the  importance  sampling  estimate  is  asymptotically  optimal  and  has  bounded  relative 
error. 


4  Experimental  Results 

In  this  section  we  will  present  experimental  results  using  the  importance  sampling  algorithm 
described  in  the  previous  section.  As  is  Section  2  define  an  A-cycle  to  begin  whenever  the 
process  enters  the  set  of  states  where  the  buffer  is  empty.  Also,  if  a  denotes  the  number 
of  high  priority  arrivals  in  a  typical  A-cycle  and  Y  denotes  the  number  of  high  priority 
packets  lost  in  an  A-cycle,  then  the  long  run  fraction  of  high  priority  packets  lost  is  given 
by  Equation  6. 

As  mentioned  in  Section  2,  asymptotically  optimal  changes  of  measure  for  estimating 
P(TZb)  (where  now  B  =  B\+B2)  also  work  well  for  estimating  E[Y],  However  the  simulation 
procedure  differs  slightly.  Similar  to  Equation  17,  define 

J 

W  =  '£tL0LjNjl{1l}nci)  (29) 

j=i 

where  N}  is  the  number  of  high  priority  packets  lost  in  the  j-th  split  path  until  it  hits 
the  buffer  empty  state.  Then  W  is  an  unbiased  estimate  (from  a  single  A-cycle)  of  E[Y]. 
Unlike  the  simulation  for  P{TLb)  (where  we  stop  simulating  the  j-th  split  path  once  it  hits 
level  B  or  downcrosses  level  B\  )  if  the  j-th  split  path  hits  B  before  it  downcrosses  level  Hj , 
then  we  have  to  simulate  it  until  the  buffer  empty  state  (in  order  to  get  a  sample  of  Nj). 

We  adopt  the  following  simulation  procedure.  We  first  run  enough  A-cycles  so  that 
the  process  (approximately)  reaches  steady  state.  Then  we  start  collecting  samples  of  W 
and  a  using  the  splitting  technique  referred  to  at  the  end  of  Section  2,  i.e.,  whenever  the 
process  enters  A  we  generate  two  parallel  cycles,  one  with  importance  sampling  and  the 
other  without  importance  sampling.  The  one  with  importance  sampling  is  used  to  generate 
samples  of  W  and  the  one  without  importance  sampling  is  used  to  generate  samples  of  a 
as  well  as  to  get  a  starting  point  for  the  next  pair  of  A-cycles.  From  n  such  sets  of  parallel 
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cycles,  samples  W\ , . . . ,  Wn  and  aj , . . . ,  an  of  W  and  a,  respectively,  are  obtained.  Then  r  is 
estimated  by  r„(^J,  )  =  £"=1  W,/  <*«'•  However,  since  the  successive  Wi' s  (  and  a,’s) 

are  not  independent,  the  method  of  batch  means  is  used  to  construct  confidence  intervals. 
We  will  now  briefly  review  this  procedure. 

Divide  Wi , . . . ,  Wn  and  c*i, . .  .,an  into  6  batches  of  size  k  =  n/6.  (Assume  that  n/6 
is  an  integer.)  Let  =  Yl)k=(i-\)k+'i  Wj/k  and  7,-  =  J3jl(«-i)fc+i  ai/^i  i-e->  these  are  the 
sample  means  corresponding  to  the  tth  batch.  The  method  of  batch  means  works  on  the 
assumption  that  for  k  sufficiently  large,  the  successive  6j's  and  7,’s  are  (approximately) 
normally  distributed  and  independent.  Hence  if  we  estimate  r  by  f  =  &%/ 7« 

(this  is  the  same  estimator  as  when  we  did  not  use  batch  means),  then,  if  6  is  also  large, 
Vb{r  —  r)  «  JV(0,<r2)  where 

_2  _  VarN  +  2rCov[^,7i]  +  r2Var[7j] 

" - m - •  (30) 

Then  an  approximate  99%  confidence  interval  can  be  constructed  as  (r  —  2.56 y/(r2/b,r  + 
2.56  s/a* /h).  The  relative  error  RE  is  thus  defined  to  be  2.56 ^/(<r2/6)/r.  In  practice,  since 
a2  is  unknown,  it  must  be  estimated,  which  can  be  done  using  the  standard  estimators  of 
all  the  quantities  in  Equation  30. 

We  now  consider  two  examples.  In  the  first  example  there  are  four  input  sources,  three 
of  which  are  high  priority  and  one  of  which  is  low  priority.  Each  source  is  a  two  state 
Markov  modulated  process,  with  a  deterministic  number  of  arrivals  in  each  state;  in  state  1 
there  is  no  arrival  and  in  state  2  there  is  exactly  1  arrival.  Let  denote  the  the  transition 
matrix  of  the  state  of  the  fcth  source,  i.e.,  p-^  =  P(Xfc(t)  =  j\Xk{t-  1)  =  i).  For  1  <  k  <  3, 
p[kJ  =  0.7  and  P22  =  0.5;  for  k  =  4,  p^  =  0.9  and  p^  =  0.5.  (This  notation  can  be  easily 
mapped  to  the  notation  of  Section  2.)  Note  that  the  individual  input  streams  are  positively 
correlated,  i.e.,  Pn*  +  P22  >  1  for  aU  k.  The  total  high  priority  and  low  priority  arrival  rate 
can  be  computed  to  be  1.125  and  0.167  respectively.  We  choose  the  switch  speed  c  =  2. 
For  such  two  state  Markov  processes,  the  appropriate  Eigenvalues  are  known  in  closed  form 
(see,  e.g.,  [9,  10]).  Using  Equation  20  and  Equation  22,  the  9\  and  were  computed  to  be 
0.75  and  2.00,  respectively. 

We  ran  experiments  for  several  values  of  B\  and  2?2;  we  fixed  /  =  0.5  and  varied  B. 
Note  that  the  Markov  chain  model  for  this  example  will  have  8 (B  +  1)  states  and  it  is  not 
very  difficult  to  solve  this  particular  model  numerically  (for  moderate  values  of  B ).  We  use 
this  model  mainly  as  a  means  of  comparing  the  effectiveness  of  our  importance  sampling 
algorithm  to  that  of  standard  simulation;  for  moderate  values  of  B ,  the  loss  rates  can  be 
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accurately  estimated  within  a  reasonable  amount  of  time  using  standard  simulation.  In 
each  experiment  (using  importance  sampling)  we  first  simulated  300  consecutive  ,4-cycles 
so  that  the  process  (approximately)  reaches  steady  state.  Then  we  simulated  for  60,000  bi¬ 
cycle  pairs.  We  used  b  =  2,000  batches  with  30  4-cycle  pairs  in  each  batch.  (The  variance 
estimate  was  fairly  insensitive  to  the  batch  size  for  the  same  total  number  of  4-cycles).  For 
the  sake  of  completeness,  we  also  estimated  the  loss  probabilities  (rj )  of  the  low  priority 
packets  using  the  same  simulation  runs;  we  used  the  4-cycles  for  the  estimation  of  the 
denominator  (in  Equation  6)  to  get  samples  of  the  total  number  of  low  priority  arrivals  in 
an  4-cycle  and  we  used  the  base  sample  path  of  the  cycle  used  for  the  estimation  of  the 
numerator,  to  get  samples  of  the  number  of  low  priority  arrivals  lost  during  an  ,4-cycle. 

Results  for  these  experiments  are  presented  in  Table  1.  Note  how  the  RE  using  impor¬ 
tance  sampling  remains  almost  constant  as  B  becomes  larger.  For  comparison  purposes  we 
also  ran  standard  simulations  for  the  same  CPU  time  that  was  used  for  the  importance  sam¬ 
pling  case.  By  standard  simulation  we  mean  that  (after  discarding  the  first  300  /I -cycles) 
at  each  entrance  to  the  set  4,  instead  of  simulating  a  pair  of  .4-cycles,  we  only  simulate  the 
4- cycle  with  no  importance  sampling.  In  addition  to  getting  samples  of  the  total  number 
of  arrivals  of  each  priority  level  from  this  4-cycle,  we  also  obtain  samples  of  the  number 
lost.  For  a  given  number  of  samples,  i.e,  pair  of  numerator/denominator  4-cycle  samples, 
importance  sampling  requires  twice  the  number  of  4 -cycle  simulations  as  compared  to 
standard  simulation.  Also,  an  4-cycle  with  importance  sampling  is  more  costly  to  simu¬ 
late  due  to  the  split  paths  and  the  computation  of  the  likelihood  ratios.  However,  as  we 
can  see  in  Table  1,  these  inefficiencies  in  the  importance  sampling  method  are  insignificant 
as  compared  to  the  speedup  gained  due  to  the  variance  reduction.  Note  that  with  standard 
simulation,  in  many  cases,  we  do  not  even  get  any  samples  of  buffer  overflow  in  an  4— cycle. 
(In  the  tables,  such  cases  are  indicated  by  a  ±?.  The  superscript  *  in  some  of  the  table 
entries  means  that  the  estimate  had  not  yet  stabilized.) 

To  compute  speedup,  we  need  to  compute  the  ratio  of  the  CPU  time  required  by  standard 
simulation  to  achieve  a  given  level  of  accuracy  to  the  CPU  time  required  by  importance 
sampling  to  achieve  the  same  level  of  accuracy.  Due  to  the  large  amount  of  CPU  time  it 
took  for  the  standard  simulation  estimates  to  stabilize,  we  were  not  able  to  compute  the 
speedup  for  all  cases,  although  this  was  possible  for  B  =  6  and  B  —  8.  For  Z?  =  6,  for  the 
same  amount  of  CPU  time,  the  RE  of  the  standard  simulation  estimate  (for  r)  is  about 
6.7  (=20.2/3)  times  larger  than  that  of  the  importance  sampling  estimate.  Since  the  RE 
decreases  at  rate  1  /  i/sample  size,  the  standard  simulation  would  have  to  be  run  about  45 
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(«  6.72)  times  longer  in  order  to  achieve  the  same  RE  as  using  importance  sampling.  Thus 
the  B  =  6  speedup  is  about  45.  For  B  =  8,  the  variance  estimate  using  standard  simulation 
had  not  yet  stabilized.  For  this  case,  we  continued  the  standard  simulation  run  until  we 
obtained  the  same  RE  (±3.2%)  as  that  obtained  in  Table  1  using  importance  sampling.  The 
CPU  time  required  to  achieve  this  same  accuracy  was  about  450  times  greater  than  that 
measured  for  importance  sampling,  i.e.,  the  B  =  8  speedup  is  about  450.  (The  standard 
simulation  estimate  of  r  was  3.13  x  10-5  ±  3.2%  compared  to  3.11  x  10-5  ±  3.2%  using 
importance  sampling.)  As  r  decreases,  the  speedup  using  importance  sampling  increases, 
although  the  speedup  becomes  more  difficult  to  compute. 

Next  we  consider  a  larger  example  with  16  two-state  Markov  modulated  input  sources 
(of  the  same  type  as  in  the  previous  example).  The  pn’s  and  pa's  of  the  16  sources  were 


(0.7,0.6,06,0.5,0.8,0.8,0.8,0.8,0.8,0.7,0.7,0.7,0.9,0.9,0.9,0.9) 


and 


(0.5, 0.8, 0.6, 0.9, 0.3, 0.6, 0.6, 0.4, 0.8, 0.5, 0.9, 0.6, 0.7, 0.8, 0.6, 0.5), 


respectively.  The  first  twelve  sources  are  considered  to  be  of  high  priority  with  a  total 
arrival  rate  of  5.57  and  the  last  four  sources  are  considered  to  be  of  low  priority  with  a 
total  arrival  rate  of  0.95.  The  switch  speed  c  was  chosen  to  be  8.  Using  Equation  20  and 
Equations  22  the  6*  and  0%  were  computed  as  0.34375  and  1.0625,  respectively. 

We  again  estimated  loss  probabilities  for  several  values  of  B  and  a  fixed  /.  Note  that 
the  Markov  chain  model  for  this  example  will  have  approximately  216(fl  +1)  states  and 
hence  it  is  very  difficult  to  compute  the  loss  probabilities  numerically.  Therefore,  simulation 
using  importance  sampling  is  very  useful.  The  initial  transient  deletion  period  was  again 
300  A-cycles.  The  simulation  was  run  for  150,000  A-cycle  pairs,  i.e.,  5,000  batches  with  30 
A-cycle  pairs  in  each  batch.  Notice  again  how  the  RE  using  importance  sampling  remains 
bounded  as  B  — ►  oo  whereas  with  standard  simulation,  in  all  cases,  we  did  not  observe 
any  high  priority  packet  losses.  With  importance  sampling,  loss  rates  as  low  as  10-23  were 
estimated  in  a  reasonable  amount  of  time  with  a  high  degree  of  accuracy. 


5  Conclusions 

This  paper  has  considered  the  application  of  importance  sampling  to  estimating  the  packet 
loss  rate  in  an  ATM  communications  switch  having  a  shared  buffer  and  two  priority  classes. 
When  the  buffer  contents  reach  a  threshold  level,  B\ ,  low  priority  packets  are  rejected.  When 
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the  buffer  contents  reaches  its  capacity,  B\  +  B?,  high  priority  packets  are  rejected.  The 
object  of  the  analysis  is  to  estimate  the  loss  rate  of  high  priority  packets.  An  asymptotically 
optimal  importance  sampling  approach  was  described.  Experiments  using  this  approach 
showed  that  many  orders  of  magnitude  speedup,  as  compared  to  standard  simulation,  can 
be  obtained.  The  speedup  increases  as  the  packet  loss  rate  decreases. 

This  importance  sampling  approach  is  unusual  in  several  respects: 

1.  The  rare  event  of  interest  occurs  because  of  the  occurrence  of  two  different  events, 
each  of  which  is  rare.  Importance  sampling  has  to  be  done  so  as  to  make  each  of  these 
rare  events  happen,  one  after  the  other.  Since  the  best  change  of  measure  is  different 
for  each  of  these  rare  events,  the  simulation  program  has  to  shift  from  one  importance 
sampling  strategy  to  another  at  appropriate  instances  in  time. 

2.  The  situation  is  further  complicated  by  the  fact  that  the  second  rare  event  may  not 
happen  immediately  after  the  first  rare  event.  As  a  result,  a  more  complicated  “split¬ 
ting”  procedure  needs  to  be  introduced  so  as  to  get  an  efficient  overall  simulation 
procedure. 

Based  on  the  analysis,  it  is  clear  that  this  approach  can  be  extended  to  the  case  when 
there  are  more  than  two  priority  levels,  e.g.,  three  priority  levels  with  an  admission  policy 
determined  by  three  threshold  levels,  B\,  B\  +  B2,  and  B^  +  B?  +  B3.  There  would  be  three 
different  values  of  9,  0\,  0J  and  0%,  with  which  to  do  importance  sampling.  Exponential 
twisting  of  the  high,  medium  and  low  priority  streams  with  parameter  9\  satisfying  a^(0J)  + 
+  a/rW)  =  c  is  done  initially  at  the  start  of  an  A-cycle  until  B\  is  reached.  Then 
exponential  twisting  of  the  high  and  medium  priority  streams  with  parameter  9\  satisfying 
2)  +  ®3U(* 2)  =  c  is  done  until  B\  +  is  reached.  Finally,  exponential  twisting  of  the 
high  priority  streams  with  parameter  9$  satisfying  0^(05)  =  c  is  done  until  B\  +  B?  +  B3 
is  reached.  In  addition,  a  multi-level  splitting  procedure  needs  to  be  applied;  the  j-th 
split  path  obtained  on  the  j'-th  upcrossing  of  B\  also  needs  to  be  split  upon  upcrossings  of 
B\  +  i?2-  This  results  in  a  rather  complicated  simulation  algorithm. 

Other  extensions  are  not  so  clear.  For  example,  suppose  when  the  buffer  reaches  level 
B\ ,  the  switch  displaces  low  priority  packets  already  in  the  buffer  with  high  priority  arriving 
packets.  Now  the  change  of  measure  required  to  move  the  buffer  from  level  B\  to  B\  + 
is  not  obvious. 

This  paper  thus  illustrates  both  the  potential  and  limitations  of  importance  sampling. 
In  certain  applications,  there  is  enough  structure  so  that  highly  effective  importance  sam- 
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pUng  schemes  can  be  devised.  In  such  cases,  importance  sampling  results  in  spectacular 
improvements  over  standard  simulation.  However,  while  some  applications  of  genuine  inter¬ 
est,  such  as  the  one  considered  here,  can  be  handled  optimally,  the  class  of  such  applications 
is  rather  narrow.  Furthermore,  small  changes  in  the  problem  structure  can  easily  lead  to 
models  for  which  the  optimal,  or  even  a  good,  importance  sampling  change  of  measure  is 
unknown. 

A  Appendix  A 

In  this  appendix  we  prove  E$»[J 2]  <  oo.  Our  approach  is  based  on  a  sequence  of  stochastic 
comparisons.  We  note  that  J  -  1  is  bounded  above  by  the  last  time  that  the  queue  is 
below  in  an  i4-cycle  when  started  from  the  level  B\  +  d,  (where  d  is  the  bound  for  the 
overshoot  when  level  B\  is  upcrossed).  Denote  this  last  time  by  T\.  Now  we  consider  an 
auxiliary  queue  in  which  the  lower  priority  arrival  streams  are  allowed  to  be  stored  in  the 
buffer  even  when  the  buffer  reaches  the  level  B\ .  Let  r2  be  the  last  time  that  the  auxiliary 
queue  is  below  B\  in  an  ^4-cycle  when  started  from  B\  +  d.  Since  there  are  more  arrivals 
in  the  auxiliary  queue  than  the  original  queue,  it  then  follows  from  a  standard  sample  path 
argument  that  r2  is  stochastically  larger  than  or  equal  to  Tj,  i.e.,  r2  >s(  t\.  Now  we  remove 
both  boundaries  at  levels  B  and  0  of  the  auxiliary  queue  and  consider  the  corresponding 
random  walk  started  from  the  same  level  B\  +  d.  Let  r3  be  the  last  time  that  the  random 
walk  is  below  level  B\ .  Clearly,  r3  >  r2  since  there  are  no  losses  in  the  random  walk  and 
there  are  sample  paths  that  the  random  walk  might  reach  level  0  and  then  bounce  back  to 
level  B\.  Now  we  will  show  that  the  distribution  of  r3  is  bounded  by  an  exponential  tail  and 
hence  all  the  moments  exist.  Via  the  stochastic  comparisons  made  above,  all  the  moments 
of  J  —  1  exist  and  Eg»[J2\  <  oo. 

To  show  that  the  tall  distribution  of  r3  is  bounded  exponentially,  let  /4(f)  denote  the 
total  number  of  arrivals  in  the  interval  [1,  *].  Thus,  the  level  of  the  random  walk  at  time  t  is 
A(i)-ct+ Bi+d.  FVom  the  definition  for  the  last  time,  P(r3  >  s)  =  P(maxi>,[/l(f)-cf+d]  > 
0).  Note  that  for  all  c  >  0  and  6  >  0 

P(majc[.A(f)  -  ct  +  d]  >  0)  =  P(max[/i(f)  -  ct  +  d  +  ft  -  et]  >  0)  (31) 

<  P(max[i4(f)  -  (c  -  f)t  +  d]  >  ts)  (32) 

<  ^(inax[A(t)  -  (c  -  ()t  +  d]  >  fs)  (33) 

=  P  ^nyuc[exp(0(/l(<)  -  (c  -  ()t  +  d))]  >  exp(0cs)j  .  (34) 
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Applying  Markov’s  inequality  and  replacing  the  maximum  by  the  sum  yields 


P(r3  >  a)  <  e-ttMe,d  £  (35) 

00 

From  Equation  10,  it  follows  that 

E[e**M-(e-«W]  =  (36) 

=  E’#[ir(M)e(**(#)"(e-€))‘]  (37) 

Since  the  state  space  of  each  source  is  finite,  H(9,t )  is  bounded.  Thus,  there  exists  a 
constant  d2(0)  <  00  for  6  >  0  such  that  H(0,t)  <  d2(0).  Also,  since  0J  >  0  is  the  solution 
of  a*(0)  =  c,  for  any  e  sufficiently  small,  there  exists  a  0  <0\  such  that  <  c  -  2t. 

Thus, 

<  d2{ef)e-Vtt .  (38) 

In  conjunction  with  (35),  one  has 

P{r3  >*)<  (39) 

where 

<K«,)  =  d2(0')et‘ld-t\l-e-9't)-1.a  (40) 


B 

/ 

Quantity 

Estimated 

Importance  Sampling 
Estimate 

Standard  Simulation 
Estimate 

6 

0.5 

r 

4.34  x  10~“  ±  3.0% 

4.72  x  10"“  ±  20.2% 

n 

3.72  x  10-3  ±  4.7% 

3.92  x  10"s±  6.1% 

8 

0.5 

r 

3.11  x  10~8  ±  3.2% 

4.19  x  10-a±  68.4%* 

ri 

1.59  x  10-a±  4.7% 

1.73  x  10-2  ±  8.9% 

20 

0.5 

• 

5.47  x  10~l3  ±  3.8% 

0.00±? 

9.53  x  10~4  ±  5.1% 

12.98  x  10"*  ±  70.8%* 

30 

0.5 

r 

1  24  x  10~‘r  ±  4.4% 

0.00±? 

ri 

1.2S  x  10~*±  5.0% 

0.00±? 

40 

0.5 

r 

3.02  x  lG*'3  ±  4.8% 

0.00±?  j| 

ri 

J.79  x  10"°  ±5.2% 

0.00±? 

Table  1:  Estimates  of  steady-state  loss  probabilities  in  the  small  example.  Importance 
sampling  and  standard  simulation  were  run  for  the  same  amount  of  CPU  time. 
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B 


Quantity 

Estimated 


Importance  Sampling  Standard  Simulation 
Estimate  Estimate 


1/3 

r 

1.13  x  10-7  ±  4.4% 

0.00±? 

ri 

1.15  x  10"2±  2.2% 

1.17  x  10-2  ±  3.9% 

ism 

1/3 

r 

8.63  x  10'io±4.8% 

0.00±? 

ri 

5.74  x  10-3  ±  2.2% 

5.96  x  10-3  ±  5.1% 

Id 

■mi 

t 

4.32  x  10"IS  ±  7.6% 

0.00±? 

L J 

r\ 

1.04  x  10~3±  2.2% 

1.06  x  10-3±  10.0% 

lllSfll 

1/3 

r 

2.15  x  lO'20  ±  4.9% 

0.00±? 

ri 

1.94  x  10“^  ±  2.2% 

2.01  x  lO-^i  19.8% 

Table  2:  Estimates  of  steady-state  loss  probabilities  in  the  large  example.  Importance 
sampling  and  standard  simulation  were  run  for  the  same  amount  of  CPU  time. 
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